Journal Club

Jay Brophy MD PhD

Departments of Medicine, Epidemiology and Biostatistics, McGill University

2025-04-26

STRIDE Trial Overview

STRIDE was a double-blind, randomised, placebo-controlled trial done at 112 outpatient clinical trial sites in 20 countries in North America, Asia, and Europe. Participants were aged 18 years and older, with type 2 diabetes and peripheral artery disease with intermittent claudication.

Participants were randomly assigned (1:1) using an interactive web response system to receive subcutaneous semaglutide 1·0 mg once per week for 52 weeks or placebo.  

The primary endpoint was the ratio to baseline of the maximum walking distance at week 52 measured on a constant load treadmill in the full analysis set.

What are the results of the trial?

# A tibble: 2 × 2
  group        mean
  <chr>       <dbl>
1 Placebo      18.3
2 Semaglutide  36.7
[1] 18.45558
[1] 9.035466e-13

Are the results clinically meaningful?

Risk of bias assessment - what interests us today

Example RoB2.0 tool for randomized controlled trials

library(robvis)
rob_summary(data_rob2, tool = "ROB2")

Risk of bias assessment - what interests us today

Example RoB2.0 tool for randomized controlled trials: traffic-light format

rob_traffic_light(data_rob2[1:3,], tool = "ROB2")

Why Change Scores Can Be Misleading

  • Regression to the mean (RTM) occurs when participants with unusually low (or high) baseline values tend to shift toward the average on repeated measurements — even without any intervention.
  • If the treatment group has low observed baseline values due to measurement noise, their follow-up scores may appear to improve more than they actually did.
  • Vickers and Altman (2001) and others have shown that comparing change scores or ratios to baseline across groups introduces bias when baseline values are imbalanced or noisy.
  • The correct approach is ANCOVA: model the follow-up outcome and adjust for the baseline as a covariate.
  • This preserves randomization and avoids overestimating treatment effects.
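
A minimal R sketch of regression to the mean, assuming illustrative values that are not taken from STRIDE (true mean 185 m, true SD 25 m, measurement noise SD 20 m). Nobody is treated, yet participants who happened to measure low at baseline appear to improve on remeasurement.

```r
# Regression to the mean with NO intervention.
# All numeric values are illustrative assumptions, not STRIDE data.
set.seed(1)
n <- 10000
true_dist <- rnorm(n, mean = 185, sd = 25)   # stable true walking distance (m)
visit1 <- true_dist + rnorm(n, sd = 20)      # noisy baseline measurement
visit2 <- true_dist + rnorm(n, sd = 20)      # noisy repeat measurement, no treatment

low <- visit1 < quantile(visit1, 0.25)       # lowest quartile of observed baseline
rtm_low <- mean(visit2[low] - visit1[low])   # apparent "improvement" in that subgroup
rtm_all <- mean(visit2 - visit1)             # whole sample: close to zero

round(c(low_baseline = rtm_low, everyone = rtm_all), 1)
```

The subgroup change is positive purely because low observed baselines are partly low by chance; the whole-sample change stays near zero.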

Simulating RTM that Inflates the Effect

  • The following simulation mimics the STRIDE trial (Investigators 2024), which measured the effect of semaglutide on maximum walking distance with a constant-load treadmill test.
  • The true treatment effect is 10 m.
  • The observed baseline is biased due to measurement error.
  • The observed follow-up is the true baseline plus the treatment effect plus noise.
  • The naive analysis (change score) will overestimate the treatment effect.
  • The correct analysis (ANCOVA) will adjust for the baseline and provide a more accurate estimate of the treatment effect.
# A tibble: 792 × 6
   group       treat baseline_true baseline_obs followup change
   <chr>       <dbl>         <dbl>        <dbl>    <dbl>  <dbl>
 1 Semaglutide     1          166.         142.     147.   5.02
 2 Semaglutide     1          171.         142.     148.   5.98
 3 Semaglutide     1          174.         149.     182.  33.1 
 4 Semaglutide     1          200.         213.     194. -18.8 
 5 Semaglutide     1          222.         211.     249.  38.1 
 6 Semaglutide     1          200.         196.     206.   9.19
 7 Semaglutide     1          163.         139.     167.  28.3 
 8 Semaglutide     1          172.         174.     178.   4.14
 9 Semaglutide     1          153.         131.     168.  37.4 
10 Semaglutide     1          170.         139.     158.  18.8 
# ℹ 782 more rows
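
The code that produced the table above is not shown. A minimal sketch of one data-generating process consistent with the bullets might look like the following. Only the 10 m true effect and the 396 participants per arm come from the slides; the means, SDs, and the downward shift in the treated arm's observed baseline (`shift_obs`) are hypothetical values chosen for illustration.

```r
# Sketch of a simulation in the spirit of the slides (not the author's code).
set.seed(2025)
n_per_arm   <- 396     # 2 x 396 = 792 rows, as in the table above
true_effect <- 10      # true treatment effect in metres (from the slides)

# Assumed values, for illustration only
mean_true <- 185       # mean true baseline distance (m)
sd_true   <- 25        # SD of true baseline distance
sd_noise  <- 25        # measurement noise at each visit
shift_obs <- -10       # hypothetical under-reading of baseline in the treated arm

treat <- rep(c(1, 0), each = n_per_arm)
baseline_true <- rnorm(2 * n_per_arm, mean_true, sd_true)
baseline_obs  <- baseline_true + shift_obs * treat + rnorm(2 * n_per_arm, 0, sd_noise)
followup      <- baseline_true + true_effect * treat + rnorm(2 * n_per_arm, 0, sd_noise)

df <- data.frame(
  group = ifelse(treat == 1, "Semaglutide", "Placebo"),
  treat, baseline_true, baseline_obs, followup,
  change = followup - baseline_obs   # naive change score
)
head(df)
```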

Published analyses

Naive Analysis (Wrong)  

[1] 19.62093
[1] 1.727469e-20

Correct Analysis: ANCOVA  


Call:
lm(formula = followup ~ treat + baseline_obs, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-91.273 -16.333   0.104  15.567  91.224 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  101.10181    5.83737  17.320  < 2e-16 ***
treat         13.68387    1.78125   7.682 4.64e-14 ***
baseline_obs   0.46234    0.03083  14.995  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 24.6 on 789 degrees of freedom
Multiple R-squared:  0.2398,    Adjusted R-squared:  0.2379 
F-statistic: 124.5 on 2 and 789 DF,  p-value: < 2.2e-16
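
The two analyses can be compared side by side in a short, self-contained sketch. The data-generating values (SDs of 25 m, a 10 m downward shift in the treated arm's observed baseline) are assumptions for illustration, not the values behind the output above; the `lm()` formula is the one shown in the call.

```r
# Change-score versus ANCOVA on simulated data (assumed parameters).
set.seed(42)
n <- 396
treat <- rep(c(1, 0), each = n)
baseline_true <- rnorm(2 * n, 185, 25)
baseline_obs  <- baseline_true - 10 * treat + rnorm(2 * n, 0, 25)  # treated arm reads low
followup      <- baseline_true + 10 * treat + rnorm(2 * n, 0, 25)  # true effect = 10 m
df <- data.frame(treat, baseline_obs, followup, change = followup - baseline_obs)

# Naive analysis: difference in mean change scores between arms
naive_diff <- with(df, mean(change[treat == 1]) - mean(change[treat == 0]))

# ANCOVA: follow-up regressed on treatment, adjusting for observed baseline
ancova <- lm(followup ~ treat + baseline_obs, data = df)
ancova_diff <- unname(coef(ancova)["treat"])

round(c(change_score = naive_diff, ancova = ancova_diff), 1)
```

Under these assumptions the change-score estimate is expected to sit near 20 m and the ANCOVA estimate near 15 m. ANCOVA shrinks the inflation rather than removing it when the observed baseline itself is shifted, which matches the output above, where the adjusted estimate (13.7) remains above the stated true effect of 10 m.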

Visualization: Inflation of Treatment Effect

Posterior (Bayesian) Summary

Running MCMC with 4 sequential chains...

Chain 1 finished in 1.3 seconds.
Chain 2 finished in 1.5 seconds.
Chain 3 finished in 1.5 seconds.
Chain 4 finished in 1.7 seconds.

All 4 chains finished successfully.
Mean chain execution time: 1.5 seconds.
Total execution time: 6.5 seconds.
# A tibble: 1 × 10
  variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
  <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 adj_diff  13.7   13.7  1.79  1.76  10.7  16.6  1.00    4128.    3683.

Posterior (Bayesian) Visualization

Selection Bias in STRIDE Trial

  • Table 2 in STRIDE shows only 338/396 (85%) in the semaglutide group and 345/396 (87%) in the placebo group were analyzed for the primary outcome (Investigators 2024).
  • This means roughly 14% of patients overall (15% on semaglutide, 13% on placebo) had missing primary outcome data.
  • If missingness is not random (e.g., related to tolerability or worsening condition), the observed effect size is biased.
  • This is particularly concerning given the subjective and effort-based nature of the walking test.
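
The missingness percentages follow directly from the Table 2 counts quoted above; a short R calculation makes the arithmetic explicit.

```r
# Missing primary-outcome data by arm, from the counts quoted above
randomised <- c(Semaglutide = 396, Placebo = 396)
analysed   <- c(Semaglutide = 338, Placebo = 345)

missing     <- randomised - analysed                  # 58 and 51 participants
pct_missing <- round(100 * missing / randomised, 1)   # Semaglutide 14.6, Placebo 12.9
pct_overall <- round(100 * sum(missing) / sum(randomised), 1)  # 13.8

pct_missing
pct_overall
```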

Sensitivity Simulation: Inflated Treatment from Selective Missingness

Suppose 15% of the data are missing, and the missingness is not random but favours the treatment group.
For example, the top 15% of placebo performers and the poorest 15% of treatment performers are lost.
This will inflate the estimated treatment effect.

[1] "Remember the unbiased (but faulty) analysis was: 19.6"
[1] "The biased analysis was: 35.6"
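
A minimal sketch of this kind of sensitivity simulation, assuming normally distributed change scores with an SD of 30 m and a 20 m difference between arms (both hypothetical; the code behind the output above is not shown):

```r
# Selective missingness: lose the best placebo and the worst treated participants.
set.seed(7)
n <- 396
change_placebo <- rnorm(n, mean = 0,  sd = 30)   # assumed change scores, placebo
change_treat   <- rnorm(n, mean = 20, sd = 30)   # assumed change scores, semaglutide

# Complete-data comparison
full_diff <- mean(change_treat) - mean(change_placebo)

# Drop the top 15% of placebo and the bottom 15% of treatment
keep_placebo <- change_placebo <= quantile(change_placebo, 0.85)
keep_treat   <- change_treat   >= quantile(change_treat,   0.15)
biased_diff  <- mean(change_treat[keep_treat]) - mean(change_placebo[keep_placebo])

round(c(complete = full_diff, selective_missingness = biased_diff), 1)
```

This is a deliberately extreme pattern; it bounds how far non-random dropout of this size could move the estimate rather than describing what happened in STRIDE.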

STRIDE Risk of Bias

Final Discussion Slide: Key Take-Home Points

  • Naive change score analyses are vulnerable to regression to the mean, especially when measurement error exists.
  • The STRIDE trial’s analysis may overestimate the treatment benefit due to this bias.
  • The trial also suffers from ~15% missing data, without clear methods to handle this — risking selection bias.
  • ANCOVA and Bayesian ANCOVA adjust for baseline and offer a more reliable estimate of treatment effect.
  • Clinicians should be wary of simple pre-post comparisons and always ask: “Was the analysis adjusted for baseline?”
  • Sponsored trials are also vulnerable to over-estimation of effect sizes.
  • Unblinded trials are particularly vulnerable to bias with over-estimation of effect sizes

DapaTAV Trial Overview

Conclusion: A trial of SGLT-2 inhibitors in 1257 patients undergoing TAVI reported a significantly lower incidence of death from any cause or worsening of heart failure than with standard care alone.

Justification: SGLT2 inhibitors reduce the risk of heart failure, but patients with valvular disease have been excluded from randomized trials.

DapaTAV Trial Results

References

Investigators, STRIDE Trial. 2024. “Semaglutide in Patients with Peripheral Artery Disease and Claudication (STRIDE): A Randomised, Double-Blind, Placebo-Controlled, Phase 2 Trial.” The Lancet. https://doi.org/10.1016/S0140-6736(24)00000-0.
Vickers, A. J., and D. G. Altman. 2001. “Statistics Notes: Analysing Controlled Trials with Baseline and Follow up Measurements.” BMJ 323 (7321): 1123–24. https://doi.org/10.1136/bmj.323.7321.1123.